Scale-space unsupervised cluster analysis
Abstract
Most scientific disciplines generate experimental data from an observed system whose data-generating function may be poorly understood. It is attractive, therefore, for an analysis system to break a complex dataset into a series of piecewise similar groups or structures, each of which may then be regarded as, for example, a separate data state, thus reducing overall data complexity. Cluster analysis has a long and rich history, and excellent reviews of many methods may be found in Jain and Dubes [10], Jain [9], Hartigan [8] and Everitt [4]. This paper presents a scale-space method of unsupervised clustering, in which the ‘optimal’ number of partitions is not known a priori. Its performance is compared to that of a Gaussian mixture model (GMM) approach using both maximum-likelihood and K-means algorithms. The multi-scale method may be seen either as falling within the hierarchical clustering genre or as a method of scale-space (multiresolution) parameter estimation. We show that the GMM fails for data sets which are not multivariate Gaussian, whilst the scale-space method is considerably more robust.

GMM Theory & Nomenclature

We consider the case of a data set, $\mathcal{X} = \{\mathbf{x}_i\}$, where $\mathcal{X} \subset \mathbb{R}^d$. Let the distribution of data in $\mathcal{X}$ form a probability density function denoted by $p(\mathcal{X})$. We may specify each GMM by a single parameter, $K$, which describes the ‘complexity’ of the model (the number of Gaussian kernels). If, furthermore, we assume, as is common practice, that the probability distribution of the within-model free parameters, $\Theta_K$, is dominated by a single, most probable, solution, $\Theta_K^{MP}$, then

$$p(\mathcal{X}) = \sum_K p(\mathcal{X} \mid K, \Theta_K^{MP})\, p(K, \Theta_K^{MP}) \qquad (1)$$

For ease of notation we assume that, for every model specified by $K$, dependence upon $\Theta_K^{MP}$ is implied. Bayes’ theorem then states that (dropping the $\Theta_K^{MP}$ terms)

$$p(K \mid \mathcal{X}) = \frac{p(\mathcal{X} \mid K)\, p(K)}{p(\mathcal{X})} \qquad (2)$$

where the evidence term, $p(\mathcal{X})$, is given by Equation (1).
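As a minimal sketch of Equation (2), assuming the log-likelihoods $\log p(\mathcal{X} \mid K)$ have already been evaluated at the most probable parameters for each candidate $K$, the posterior over model complexity is obtained by normalising $p(\mathcal{X} \mid K)\,p(K)$ over the candidates, i.e. dividing by the evidence sum of Equation (1) restricted to those candidates. The function name `posterior_over_k`, the uniform default prior and the example log-likelihood values are illustrative assumptions, not taken from the paper. Working in log space with a log-sum-exp avoids underflow, since $p(\mathcal{X} \mid K)$ for a realistic data set is far too small to represent directly.

```python
import math


def posterior_over_k(log_lik, log_prior=None):
    """Posterior p(K | X) over candidate model complexities, as in Eq. (2).

    log_lik   -- dict mapping K -> log p(X | K), evaluated at the most probable
                 within-model parameters (supplied by the caller)
    log_prior -- dict mapping K -> log p(K); assumed uniform if omitted
    """
    ks = sorted(log_lik)
    if log_prior is None:
        log_prior = {k: -math.log(len(ks)) for k in ks}
    # Unnormalised log posterior: log p(X | K) + log p(K)
    joint = {k: log_lik[k] + log_prior[k] for k in ks}
    # Log evidence, log p(X) = log sum_K p(X | K) p(K) (Eq. 1), via log-sum-exp
    m = max(joint.values())
    log_evidence = m + math.log(sum(math.exp(v - m) for v in joint.values()))
    return {k: math.exp(joint[k] - log_evidence) for k in ks}


# Hypothetical log-likelihoods for K = 1, 2, 3
post = posterior_over_k({1: -120.0, 2: -100.0, 3: -101.0})
best_k = max(post, key=post.get)
print(best_k, round(post[2], 3), round(post[3], 3))
```

With these hypothetical values the call prints `2 0.731 0.269`: $K = 2$ receives about 73% of the posterior mass and $K = 3$ about 27%, while $K = 1$ is negligible.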
As the number of Gaussian kernels, $K$, specifies the number of data partitions, we may use $p(K \mid \mathcal{X})$ as a partition validation measure.

Parameter Estimation

For a Gaussian mixture model, each kernel is a multivariate Gaussian whose free parameters are completely specified by its prior, $p(k)$, mean, $\boldsymbol{\mu}_k$, and covariance matrix, $\Sigma_k$. Although application of the maximum-likelihood (ML) approach gives closed analytic forms for these parameters, each expression depends on the others through the kernel posterior probabilities, so parameter solution estimates require a non-linear optimisation algorithm. Solutions may be estimated using batch updating algorithms [5, 22] or by means of iterative variants of the Expectation-Maximisation (EM) algorithm [3], full details of which may be found in, for example, [18, 21]. The K-means algorithm is a well-known simplification of the ML approach; details may be found in, e.g., [5, 22].

GMM Cluster Validity

The use of either the K-means or the ML approach imposes an implicit assumption of hyper-ellipsoidal clusters on the data model. Most cluster validity measures are based upon estimates of the kernel covariance matrices for a given model complexity (the number of kernels in the GMM) [9]. This paper follows a proposal made in [5] to use the ‘fuzzy hypervolume’, $V$, of the data partitioning. For a GMM with $K$ components this is given as

$$V_K = \sum_{k=1}^{K} |\Sigma_k|^{1/2} \qquad (3)$$

If we allow $V_K$ to act as a penalty term, such that data models with large values of $V_K$ have correspondingly low prior probabilities, then ranking models according to their posterior on the data set becomes equivalent to ranking on a ‘likelihood density’ term, $\rho(K)$, given by

$$\rho(K) = \frac{p(\mathcal{X} \mid K)}{V_K} \qquad (4)$$

We note that the above formalism may be applied to both the ML and K-means partitioning methods.
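The estimation and validity steps can be illustrated with a minimal NumPy sketch. It fits a full-covariance GMM by EM for a given $K$, computes the fuzzy hypervolume $V_K$ of Equation (3), and returns $\log \rho(K) = \log p(\mathcal{X} \mid K) - \log V_K$ from Equation (4), taking the maximised log-likelihood as $\log p(\mathcal{X} \mid K)$. The farthest-point initialisation of the means, the fixed iteration count, the small ridge `reg` added to each covariance, the synthetic two-blob data and all function names are assumptions made for illustration; this is not the paper's implementation.

```python
import numpy as np


def log_weighted_densities(X, weights, means, covs):
    """log[p(k) N(x_i | mu_k, Sigma_k)] for every point i and kernel k, shape (n, K)."""
    n, d = X.shape
    out = np.empty((n, len(weights)))
    for k, (w, mu, S) in enumerate(zip(weights, means, covs)):
        L = np.linalg.cholesky(S)                  # S = L L^T
        z = np.linalg.solve(L, (X - mu).T)         # whitened residuals, shape (d, n)
        maha = np.sum(z ** 2, axis=0)              # squared Mahalanobis distances
        log_det = 2.0 * np.sum(np.log(np.diag(L)))
        out[:, k] = np.log(w) - 0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)
    return out


def log_sum_exp_rows(a):
    m = a.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(a - m).sum(axis=1))


def fit_gmm_em(X, K, n_iter=200, seed=0, reg=1e-6):
    """ML fit of a K-kernel GMM by EM; returns (priors, means, covariances, log-likelihood)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Assumed heuristic: farthest-point initialisation of the kernel means
    idx = [int(rng.integers(n))]
    for _ in range(1, K):
        dist2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2)
        idx.append(int(np.argmax(dist2.min(axis=1))))
    means = X[idx]
    base_cov = np.cov(X, rowvar=False).reshape(d, d) + reg * np.eye(d)
    covs = np.repeat(base_cov[None, :, :], K, axis=0)
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: posterior probability of each kernel given each point
        log_p = log_weighted_densities(X, weights, means, covs)
        resp = np.exp(log_p - log_sum_exp_rows(log_p)[:, None])
        # M-step: closed-form ML updates given the current posteriors
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for k in range(K):
            diff = X - means[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / nk[k] + reg * np.eye(d)
    loglik = log_sum_exp_rows(log_weighted_densities(X, weights, means, covs)).sum()
    return weights, means, covs, float(loglik)


def fuzzy_hypervolume(covs):
    """V_K = sum_k |Sigma_k|^(1/2), Eq. (3)."""
    return float(sum(np.sqrt(np.linalg.det(S)) for S in covs))


def log_likelihood_density(X, K, **kwargs):
    """log rho(K) = log p(X | K) - log V_K, Eq. (4)."""
    _, _, covs, loglik = fit_gmm_em(X, K, **kwargs)
    return loglik - np.log(fuzzy_hypervolume(covs))


# Hypothetical data: two well-separated 2-D Gaussian blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(8.0, 1.0, (200, 2))])
for K in range(1, 5):
    print(K, round(log_likelihood_density(X, K), 1))
```

Working with $\log \rho(K)$ rather than $\rho(K)$ keeps the comparison numerically stable, because $p(\mathcal{X} \mid K)$ underflows to zero for all but very small data sets; under Equation (4) the candidate with the largest $\log \rho(K)$ is the preferred partitioning.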
Similar resources
Unsupervised Texture Segmentation on the Basis of Scale Space Features
A novel approach to unsupervised texture segmentation is presented, which is formulated as a combinatorial optimization problem known as sparse pairwise data clustering. Pairwise dissimilarities between texture blocks are measured by scale space features, i.e., multi-resolution edges. These scale space features are computed by a Gabor filter bank tuned to spatial frequencies. To solve the data c...
Morphological hat-transform scale spaces and their use in texture classification
In this paper we present a multi-scale morphological method for use in texture classification. A connected operator similar to the morphological hat-transform is defined, and two scale-space representations are built. The most important features are extracted from the scale spaces by unsupervised cluster analysis, and the resulting pattern vectors provide the input of a decision tree classifier...
Rough Fuzzy Computing for Unsupervised Image Segmentation
In this paper we consider the problem of unsupervised boundary localization in textured images, reporting a texture separation algorithm which extracts textural density gradients by a non-linear multiple scale-space analysis of the image. Texture boundaries are extracted by segmenting the images resulting from a multiscale fuzzy gradient operation applied to detail images. The segmentation stag...
Diatom Contour Analysis Using Morphological Curvature Scale Spaces
A method for shape analysis of diatoms (single-cell algae with silica shells) based on extraction of features on the contour of the cells by multi-scale mathematical morphology is presented. After building a morphological contour curvature scale space, we present a method for extracting the most prominent features by unsupervised cluster analysis. The number of extracted features matches well w...
Unsupervised Classification of X-Ray Mapping Images of Polished Sections
X-ray mapping images of polished sections are classified using two unsupervised clustering algorithms. The methods applied are the k-means algorithm and an extended spectral fuzzy c-means algorithm. The extensions include new types of memberships that are related to the contextual information. In addition to the traditional spectral membership we apply a spatial membership and a parental member...
Nonlinear Processing of Large Scale Satellite Images via Unsupervised Clustering and Image Segmentation
For large scale satellite images, it is inevitable that images will be affected by various uncertain factors, especially those from atmosphere. To minimize the impact of atmosphere medium dispersing, image segmentation is an essential procedure. As one of the most critical means of image processing and data analysis approach, segmentation is to classify an image into parts that have a strong co...
Publication date: 1996